{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\nAuto-tuning a convolutional network on VTA\n==========================================\n**Author**: `Lianmin Zheng <https://github.com/merrymercy>`_, `Thierry Moreau <https://homes.cs.washington.edu/~moreau/>`_\n\nAuto-tuning for a specific accelerator design is critical for getting the best\nperformance for any given operator. This is a tutorial showcases how to tune a\nwhole convolutional network on VTA.\n\nThe operator implementation for VTA in TVM is written in template form.\nThe template has many tunable knobs (tile factor, virtual threads, etc).\nWe will tune all convolution operators in the neural network. After tuning,\nwe produce a log file which stores the best schedule parameters for all tuned\noperators. When the TVM compiler compiles these operators, it will query this\nlog file to get the best knob parameters.\n\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Install dependencies\n--------------------\nTo use the autotvm package in tvm, we need to install some extra dependencies.\n(change \"3\" to \"2\" if you use python2):\n\n.. code-block:: bash\n\n  pip3 install --user psutil xgboost tornado mxnet requests \"Pillow<7\" cloudpickle\n\nTo make TVM run faster during tuning, it is recommended to use cython\nas FFI of TVM. In the root directory of TVM, execute\n(change \"3\" to \"2\" if you use python2):\n\n.. code-block:: bash\n\n  pip3 install --user cython\n  sudo make cython3\n\nNow return to python code. Import packages.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import os\nfrom mxnet.gluon.model_zoo import vision\nimport numpy as np\nfrom PIL import Image\n\nfrom tvm import topi\nimport tvm\nfrom tvm import te\nfrom tvm import rpc, autotvm, relay\nfrom tvm.contrib import graph_executor, utils, download\nfrom tvm.autotvm.measure.measure_methods import request_remote\nfrom tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner\n\nimport vta\nfrom vta.testing import simulator\nfrom vta.top import graph_pack"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Compile network\n---------------\nPerform vta-specific compilation with Relay from a Gluon model\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def compile_network(env, target, model, start_pack, stop_pack):\n\n    # Populate the shape and data type dictionary\n    dtype_dict = {\"data\": \"float32\"}\n    shape_dict = {\"data\": (env.BATCH, 3, 224, 224)}\n\n    # Get off the shelf gluon model, and convert to relay\n    gluon_model = vision.get_model(model, pretrained=True)\n    mod, params = relay.frontend.from_mxnet(gluon_model, shape_dict)\n\n    # Update shape and type dictionary\n    shape_dict.update({k: v.shape for k, v in params.items()})\n    dtype_dict.update({k: str(v.dtype) for k, v in params.items()})\n\n    # Perform quantization in Relay\n    # Note: We set opt_level to 3 in order to fold batch norm\n    with tvm.transform.PassContext(opt_level=3):\n        with relay.quantize.qconfig(global_scale=8.0, skip_conv_layers=[0]):\n            mod = relay.quantize.quantize(mod, params=params)\n\n    # Perform graph packing and constant folding for VTA target\n    if target.device_name == \"vta\":\n        assert env.BLOCK_IN == env.BLOCK_OUT\n        relay_prog = graph_pack(\n            mod[\"main\"],\n            env.BATCH,\n            env.BLOCK_OUT,\n            env.WGT_WIDTH,\n            start_name=start_pack,\n            stop_name=stop_pack,\n        )\n\n    return relay_prog, params"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Start RPC Tracker\n-----------------\nTVM uses an RPC session to communicate with Pynq boards.\nDuring tuning, the tuner will send the generated code to the board and\nmeasure the speed of code on the board.\n\nTo scale up tuning, TVM uses an RPC Tracker to manage multiple devices.\nThe RPC Tracker is a centralized controller node. We can register all devices to\nthe tracker. For example, if we have 10 Pynq boards, we can register all of them\nto the tracker, and run 10 measurements in parallel, accelerating the tuning process.\n\nTo start an RPC tracker, run this command on the host machine. The tracker is\nrequired during the whole tuning process, so we need to open a new terminal for\nthis command:\n\n.. code-block:: bash\n\n  python -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190\n\nThe expected output is:\n\n.. code-block:: bash\n\n  INFO:RPCTracker:bind to 0.0.0.0:9190\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Register devices to RPC Tracker\n-----------------------------------\nNow we can register our devices to the tracker. The first step is to\nbuild the TVM runtime for the Pynq devices.\n\nFollow `vta-index`\nto build the TVM runtime on the device. Then register the device to the tracker with:\n\n.. code-block:: bash\n\n  python -m tvm.exec.rpc_server --tracker=[HOST_IP]:9190 --key=pynq\n\n(replace :code:`[HOST_IP]` with the IP address of your host machine)\n\nAfter registering devices, we can confirm it by querying the rpc_tracker:\n\n.. code-block:: bash\n\n  python -m tvm.exec.query_rpc_tracker --host=0.0.0.0 --port=9190\n\nFor example, if we have 6 Pynq boards and 11 Raspberry Pi 3B,\nthe output can be\n\n.. code-block:: bash\n\n   Queue Status\n   ----------------------------------\n   key          total  free  pending\n   ----------------------------------\n   pynq         6      6     0\n   rpi3b        11     11    0\n   ----------------------------------\n\nYou can register multiple devices to the tracker to accelerate tuning.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Set Tuning Options\n------------------\nBefore tuning, we should apply some configurations.\nHere we use an Pynq-Z1 board as an example.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Tracker host and port can be set by your environment\ntracker_host = os.environ.get(\"TVM_TRACKER_HOST\", \"127.0.0.1\")\ntracker_port = int(os.environ.get(\"TVM_TRACKER_PORT\", 9190))\n\n# Load VTA parameters from the 3rdparty/vta-hw/config/vta_config.json file\nenv = vta.get_env()\n\n# This target is used for cross compilation. You can query it by :code:`gcc -v` on your device.\n# Set ``device=arm_cpu`` to run inference on the CPU\n# or ``device=vta`` to run inference on the FPGA.\ndevice = \"vta\"\ntarget = env.target if device == \"vta\" else env.target_vta_cpu\n\n# Name of Gluon model to compile\n# The ``start_pack`` and ``stop_pack`` labels indicate where\n# to start and end the graph packing relay pass: in other words\n# where to start and finish offloading to VTA.\nnetwork = \"resnet18_v1\"\nstart_pack = \"nn.max_pool2d\"\nstop_pack = \"nn.global_avg_pool2d\"\n\n# Tuning option\nlog_file = \"%s.%s.log\" % (device, network)\ntuning_option = {\n    \"log_filename\": log_file,\n    \"tuner\": \"random\",\n    \"n_trial\": 1000,\n    \"early_stopping\": None,\n    \"measure_option\": autotvm.measure_option(\n        builder=autotvm.LocalBuilder(),\n        runner=autotvm.RPCRunner(\n            env.TARGET,\n            host=tracker_host,\n            port=tracker_port,\n            number=5,\n            timeout=60,\n            module_loader=vta.module_loader(),\n            # check_correctness=True, # TODO: re-enable when check_correctness works again.\n        ),\n    ),\n}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<div class=\"alert alert-info\"><h4>Note</h4><p>How to set tuning options\n\n  In general, the default values provided here work well.\n  If you have enough time budget, you can set :code:`n_trial`, :code:`early_stopping`\n  to larger values, makes the tuning run for longer.\n  If your device is under-powered or your conv2d operators are large, consider\n  setting a longer timeout.</p></div>\n\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Begin Tuning\n------------\nNow we can extract tuning tasks from the network and begin tuning.\nHere, we provide a simple utility function to tune a list of tasks.\nThis function is just an initial implementation which tunes them in sequential order.\nWe will introduce a more sophisticated tuning scheduler in the future.\n\nGiven that the tuning will be done on Pynq FPGA boards, make sure that\nthe ```TARGET`` entry in the ``vta_config.json`` file is set to ``pynq``.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# You can skip the implementation of this function for this tutorial.\ndef tune_tasks(\n    tasks,\n    measure_option,\n    tuner=\"xgb\",\n    n_trial=1000,\n    early_stopping=None,\n    log_filename=\"tuning.log\",\n    use_transfer_learning=True,\n):\n\n    # create tmp log file\n    tmp_log_file = log_filename + \".tmp\"\n    if os.path.exists(tmp_log_file):\n        os.remove(tmp_log_file)\n\n    for i, tsk in enumerate(reversed(tasks)):\n        prefix = \"[Task %2d/%2d] \" % (i + 1, len(tasks))\n\n        # create tuner\n        if tuner == \"xgb\" or tuner == \"xgb-rank\":\n            tuner_obj = XGBTuner(tsk, loss_type=\"rank\")\n        elif tuner == \"xgb_knob\":\n            tuner_obj = XGBTuner(tsk, loss_type=\"rank\", feature_type=\"knob\")\n        elif tuner == \"ga\":\n            tuner_obj = GATuner(tsk, pop_size=50)\n        elif tuner == \"random\":\n            tuner_obj = RandomTuner(tsk)\n        elif tuner == \"gridsearch\":\n            tuner_obj = GridSearchTuner(tsk)\n        else:\n            raise ValueError(\"Invalid tuner: \" + tuner)\n\n        if use_transfer_learning:\n            if os.path.isfile(tmp_log_file):\n                tuner_obj.load_history(autotvm.record.load_from_file(tmp_log_file))\n\n        # do tuning\n        tsk_trial = min(n_trial, len(tsk.config_space))\n        tuner_obj.tune(\n            n_trial=tsk_trial,\n            early_stopping=early_stopping,\n            measure_option=measure_option,\n            callbacks=[\n                autotvm.callback.progress_bar(tsk_trial, prefix=prefix),\n                autotvm.callback.log_to_file(tmp_log_file),\n            ],\n        )\n\n    # pick best records to a cache file\n    autotvm.record.pick_best(tmp_log_file, log_filename)\n    os.remove(tmp_log_file)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Register VTA-specific tuning tasks\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def register_vta_tuning_tasks():\n    from tvm.autotvm.task import TaskExtractEnv\n\n    @tvm.te.tag_scope(tag=topi.tag.ELEMWISE)\n    def my_clip(x, a_min, a_max):\n        \"\"\"Unlike topi's current clip, put min and max into two stages.\"\"\"\n        const_min = tvm.tir.const(a_min, x.dtype)\n        const_max = tvm.tir.const(a_max, x.dtype)\n        x = te.compute(x.shape, lambda *i: tvm.te.min(x(*i), const_max), name=\"clipA\")\n        x = te.compute(x.shape, lambda *i: tvm.te.max(x(*i), const_min), name=\"clipB\")\n        return x\n\n    # init autotvm env to register VTA operator\n    TaskExtractEnv()\n\n    @autotvm.template(\"conv2d_packed.vta\")\n    def _topi_nn_conv2d(*args, **kwargs):\n        assert not kwargs, \"Do not support kwargs in template function call\"\n        A, W = args[:2]\n\n        with tvm.target.vta():\n            res = vta.top.conv2d_packed(*args, **kwargs)\n            res = topi.right_shift(res, 8)\n            res = my_clip(res, 0, 127)\n            res = topi.cast(res, \"int8\")\n\n        if tvm.target.Target.current().device_name == \"vta\":\n            s = vta.top.schedule_conv2d_packed([res])\n        else:\n            s = te.create_schedule([res.op])\n        return s, [A, W, res]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, we launch tuning jobs and evaluate the end-to-end performance.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def tune_and_evaluate(tuning_opt):\n\n    # Register VTA tuning tasks\n    register_vta_tuning_tasks()\n\n    # Perform task extraction on Relay program\n    print(\"Extract tasks...\")\n    relay_prog, params = compile_network(env, target, network, start_pack, stop_pack)\n    mod = tvm.IRModule.from_expr(relay_prog)\n    tasks = autotvm.task.extract_from_program(\n        mod,\n        params=params,\n        ops=(relay.op.get(\"nn.conv2d\"),),\n        target=target,\n        target_host=env.target_host,\n    )\n\n    # filter out non-packed conv2d task\n    tasks = list(filter(lambda t: len(t.args[0][1]) > 4 and \"conv\" in t.name, tasks))\n\n    # We should have extracted 10 convolution tasks\n    assert len(tasks) == 10\n    print(\"Extracted {} conv2d tasks:\".format(len(tasks)))\n    for tsk in tasks:\n        inp = tsk.args[0][1]\n        wgt = tsk.args[1][1]\n        batch = inp[0] * inp[4]\n        in_filter = inp[1] * inp[5]\n        out_filter = wgt[0] * wgt[4]\n        height, width = inp[2], inp[3]\n        hkernel, wkernel = wgt[2], wgt[3]\n        hstride, wstride = tsk.args[2][0], tsk.args[2][1]\n        hpad, wpad = tsk.args[3][0], tsk.args[3][1]\n        print(\n            \"({}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {})\".format(\n                batch,\n                height,\n                width,\n                in_filter,\n                out_filter,\n                hkernel,\n                wkernel,\n                hpad,\n                wpad,\n                hstride,\n                wstride,\n            )\n        )\n\n    # We do not run the tuning in our webpage server since it takes too long.\n    # Comment the following line to run it by yourself.\n    return\n\n    # run tuning tasks\n    print(\"Tuning...\")\n    tune_tasks(tasks, **tuning_opt)\n\n    # evaluate with tuning history\n    if env.TARGET != \"sim\":\n        # Get remote from fleet node\n        remote = autotvm.measure.request_remote(\n            env.TARGET, tracker_host, tracker_port, timeout=10000\n        )\n        # Reconfigure the JIT runtime and FPGA.\n        vta.reconfig_runtime(remote)\n        vta.program_fpga(remote, bitstream=None)\n    else:\n        # In simulation mode, host the RPC server locally.\n        remote = rpc.LocalSession()\n\n    # compile kernels with history best records\n    with autotvm.tophub.context(target, extra_files=[log_file]):\n        # Compile network\n        print(\"Compile...\")\n        if target.device_name != \"vta\":\n            with tvm.transform.PassContext(opt_level=3, disabled_pass={\"AlterOpLayout\"}):\n                lib = relay.build(\n                    relay_prog, target=target, params=params, target_host=env.target_host\n                )\n        else:\n            with vta.build_config(opt_level=3, disabled_pass={\"AlterOpLayout\"}):\n                lib = relay.build(\n                    relay_prog, target=target, params=params, target_host=env.target_host\n                )\n\n        # Export library\n        print(\"Upload...\")\n        temp = utils.tempdir()\n        lib.export_library(temp.relpath(\"graphlib.tar\"))\n        remote.upload(temp.relpath(\"graphlib.tar\"))\n        lib = remote.load_module(\"graphlib.tar\")\n\n        # Generate the graph executor\n        ctx = remote.ext_dev(0) if device == \"vta\" else remote.cpu(0)\n        m = graph_executor.GraphModule(lib[\"default\"](ctx))\n\n        # upload parameters to device\n        image = tvm.nd.array((np.random.uniform(size=(1, 3, 224, 224))).astype(\"float32\"))\n        m.set_input(\"data\", image)\n\n        # evaluate\n        print(\"Evaluate inference time cost...\")\n        timer = m.module.time_evaluator(\"run\", ctx, number=1, repeat=10)\n        tcost = timer()\n        prof_res = np.array(tcost.results) * 1000  # convert to millisecond\n        print(\n            \"Mean inference time (std dev): %.2f ms (%.2f ms)\"\n            % (np.mean(prof_res), np.std(prof_res))\n        )\n\n\n# Run the tuning and evaluate the results\ntune_and_evaluate(tuning_option)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Sample Output\n-------------\nThe tuning needs to compile many programs and extract feature from them.\nSo a high performance CPU is recommended.\nOne sample output is listed below.\nIt takes about 2 hours on a 16T CPU, and 6 Pynq boards.\n\n.. code-block:: bash\n\n   Extract tasks...\n   [Warning] Invalid shape during AutoTVM task creation\n   Extracted 10 conv2d tasks:\n       Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 16, 14, 14, 1, 16), 'int8'), ('TENSOR', (32, 16, 1, 1, 16, 16), 'int8'), (2, 2), (0, 0), (1, 1), 'NCHW1n16c', 'int32'), kwargs={}, workload=('conv2d', (1, 16, 14, 14, 1, 16, 'int8'), (32, 16, 1, 1, 16, 16, 'int8'), (2, 2), (0, 0), (1, 1), 'NCHW1n16c', 'int32'))\n       Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 8, 28, 28, 1, 16), 'int8'), ('TENSOR', (16, 8, 1, 1, 16, 16), 'int8'), (2, 2), (0, 0), (1, 1), 'NCHW1n16c', 'int32'), kwargs={}, workload=('conv2d', (1, 8, 28, 28, 1, 16, 'int8'), (16, 8, 1, 1, 16, 16, 'int8'), (2, 2), (0, 0), (1, 1), 'NCHW1n16c', 'int32'))\n       Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 4, 56, 56, 1, 16), 'int8'), ('TENSOR', (8, 4, 1, 1, 16, 16), 'int8'), (2, 2), (0, 0), (1, 1), 'NCHW1n16c', 'int32'), kwargs={}, workload=('conv2d', (1, 4, 56, 56, 1, 16, 'int8'), (8, 4, 1, 1, 16, 16, 'int8'), (2, 2), (0, 0), (1, 1), 'NCHW1n16c', 'int32'))\n       Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 4, 56, 56, 1, 16), 'int8'), ('TENSOR', (4, 4, 3, 3, 16, 16), 'int8'), (1, 1), (1, 1), (1, 1), 'NCHW1n16c', 'int32'), kwargs={}, workload=('conv2d', (1, 4, 56, 56, 1, 16, 'int8'), (4, 4, 3, 3, 16, 16, 'int8'), (1, 1), (1, 1), (1, 1), 'NCHW1n16c', 'int32'))\n       Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 8, 28, 28, 1, 16), 'int8'), ('TENSOR', (8, 8, 3, 3, 16, 16), 'int8'), (1, 1), (1, 1), (1, 1), 'NCHW1n16c', 'int32'), kwargs={}, workload=('conv2d', (1, 8, 28, 28, 1, 16, 'int8'), (8, 8, 3, 3, 16, 16, 'int8'), (1, 1), (1, 1), (1, 1), 'NCHW1n16c', 'int32'))\n       Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 4, 56, 56, 1, 16), 'int8'), ('TENSOR', (8, 4, 3, 3, 16, 16), 'int8'), (2, 2), (1, 1), (1, 1), 'NCHW1n16c', 'int32'), kwargs={}, workload=('conv2d', (1, 4, 56, 56, 1, 16, 'int8'), (8, 4, 3, 3, 16, 16, 'int8'), (2, 2), (1, 1), (1, 1), 'NCHW1n16c', 'int32'))\n       Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 16, 14, 14, 1, 16), 'int8'), ('TENSOR', (16, 16, 3, 3, 16, 16), 'int8'), (1, 1), (1, 1), (1, 1), 'NCHW1n16c', 'int32'), kwargs={}, workload=('conv2d', (1, 16, 14, 14, 1, 16, 'int8'), (16, 16, 3, 3, 16, 16, 'int8'), (1, 1), (1, 1), (1, 1), 'NCHW1n16c', 'int32'))\n       Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 8, 28, 28, 1, 16), 'int8'), ('TENSOR', (16, 8, 3, 3, 16, 16), 'int8'), (2, 2), (1, 1), (1, 1), 'NCHW1n16c', 'int32'), kwargs={}, workload=('conv2d', (1, 8, 28, 28, 1, 16, 'int8'), (16, 8, 3, 3, 16, 16, 'int8'), (2, 2), (1, 1), (1, 1), 'NCHW1n16c', 'int32'))\n       Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 32, 7, 7, 1, 16), 'int8'), ('TENSOR', (32, 32, 3, 3, 16, 16), 'int8'), (1, 1), (1, 1), (1, 1), 'NCHW1n16c', 'int32'), kwargs={}, workload=('conv2d', (1, 32, 7, 7, 1, 16, 'int8'), (32, 32, 3, 3, 16, 16, 'int8'), (1, 1), (1, 1), (1, 1), 'NCHW1n16c', 'int32'))\n       Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 16, 14, 14, 1, 16), 'int8'), ('TENSOR', (32, 16, 3, 3, 16, 16), 'int8'), (2, 2), (1, 1), (1, 1), 'NCHW1n16c', 'int32'), kwargs={}, workload=('conv2d', (1, 16, 14, 14, 1, 16, 'int8'), (32, 16, 3, 3, 16, 16, 'int8'), (2, 2), (1, 1), (1, 1), 'NCHW1n16c', 'int32'))\n   Tuning...\n   [Task  1/10]  Current/Best:    0.72/  23.24 GFLOPS | Progress: (480/1000) | 640.31 s Done.\n   [Task  2/10]  Current/Best:    0.00/  27.69 GFLOPS | Progress: (576/1000) | 810.09 s Done.\n   [Task  3/10]  Current/Best:    0.00/  22.97 GFLOPS | Progress: (1000/1000) | 1125.37 s Done.\n   [Task  4/10]  Current/Best:    0.00/  31.26 GFLOPS | Progress: (1000/1000) | 1025.52 s Done.\n   [Task  5/10]  Current/Best:    0.00/  15.15 GFLOPS | Progress: (1000/1000) | 1236.58 s Done.\n   [Task  6/10]  Current/Best:    0.00/  22.74 GFLOPS | Progress: (1000/1000) | 906.60 s Done.\n   [Task  7/10]  Current/Best:    0.00/  15.27 GFLOPS | Progress: (1000/1000) | 1056.25 s Done.\n   [Task  8/10]  Current/Best:    0.00/   2.18 GFLOPS | Progress: (1000/1000) | 2275.29 s Done.\n   [Task  9/10]  Current/Best:    2.23/   3.99 GFLOPS | Progress: (1000/1000) | 2527.25 s Done.\n   [Task 10/10]  Current/Best:    1.56/   6.32 GFLOPS | Progress: (480/1000) | 1304.84 s Done.\n   Compile...\n   Upload...\n   Evaluate inference time cost...\n   Mean inference time (std dev): 621.79 ms (0.14 ms)\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<div class=\"alert alert-info\"><h4>Note</h4><p>**Experiencing Difficulties?**\n\n  The auto tuning module is error-prone. If you always see \" 0.00/ 0.00 GFLOPS\",\n  then there must be something wrong.\n\n  First, make sure you set the correct configuration of your device.\n  Then, you can print debug information by adding these lines in the beginning\n  of the script. It will print every measurement result, where you can find useful\n  error messages.\n\n  .. code-block:: python\n\n     import logging\n     logging.getLogger('autotvm').setLevel(logging.DEBUG)\n\n  Finally, always feel free to ask our community for help on https://discuss.tvm.apache.org</p></div>\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}